67 research outputs found
A WCET-aware cache coloring technique for reducing interference in real-time systems
The time predictability of a system is the condition for giving safe and precise bounds on the worst-case execution time (WCET) of the real-time functionalities running on it. Commercial off-the-shelf (COTS) processors are increasingly used in embedded systems and contain shared cache memory. This component has hard-to-predict behavior because its state depends on the execution history of the system. To increase the predictability of COTS components we use cache coloring, a technique widely used to partition cache memory. Our main contribution is a WCET-aware heuristic that partitions the cache among tasks according to the needs of each task. Our experiments use CPLEX, an ILP solver, on randomly generated task sets running on a preemptive system scheduled with earliest deadline first (EDF).
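As a rough illustration of the cache coloring idea mentioned in this abstract, the sketch below computes the color of a physical page. It assumes a physically indexed, set-associative cache whose set index bits are contiguous; the function names and the cache geometry are hypothetical and are not taken from the paper.

```python
def num_colors(cache_size: int, ways: int, page_size: int) -> int:
    """Number of page colors: size of one cache way divided by the page size."""
    way_size = cache_size // ways
    return max(1, way_size // page_size)


def page_color(phys_addr: int, cache_size: int, ways: int, page_size: int) -> int:
    """Color of the physical page holding phys_addr.

    Pages with different colors map to disjoint cache sets, so giving
    tasks disjoint colors keeps them from evicting each other's lines.
    """
    page_number = phys_addr // page_size
    return page_number % num_colors(cache_size, ways, page_size)


# Hypothetical geometry: 2 MiB, 16-way shared cache with 4 KiB pages.
CACHE, WAYS, PAGE = 2 * 1024 * 1024, 16, 4096

if __name__ == "__main__":
    print(num_colors(CACHE, WAYS, PAGE))
    print(page_color(5 * PAGE + 123, CACHE, WAYS, PAGE))
```

An allocator built on this would hand each task only pages whose color belongs to that task's partition; how many colors each task receives is the decision the WCET-aware heuristic makes.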
Grassroots Operator Search for Model Edge Adaptation
Hardware-aware Neural Architecture Search (HW-NAS) is increasingly being used
to design efficient deep learning architectures. An efficient and flexible
search space is crucial to the success of HW-NAS. Current approaches focus on
designing a macro-architecture and searching for the architecture's
hyperparameters based on a set of possible values. This approach is biased by
the expertise of deep learning (DL) engineers and standard modeling approaches.
In this paper, we present a Grassroots Operator Search (GOS) methodology. Our
HW-NAS adapts a given model for edge devices by searching for efficient
operator replacement. We express each operator as a set of mathematical
instructions that capture its behavior. The mathematical instructions are then
used as the basis for searching and selecting efficient replacement operators
that maintain the accuracy of the original model while reducing computational
complexity. Our approach is grassroots since it relies on the mathematical
foundations to construct new and efficient operators for DL architectures. We
demonstrate on various DL models that our method consistently outperforms the
original models on two edge devices, namely Redmi Note 7S and Raspberry Pi3,
with a minimum of 2.2x speedup while maintaining high accuracy. Additionally,
we showcase a use case of our GOS approach in pulse rate estimation on
wristband devices, where we achieve state-of-the-art performance, while
maintaining reduced computational complexity, demonstrating the effectiveness
of our approach in practical applications.
HyT-NAS: Hybrid Transformers Neural Architecture Search for Edge Devices
Vision Transformers have enabled recent attention-based Deep Learning (DL)
architectures to achieve remarkable results in Computer Vision (CV) tasks.
However, due to the extensive computational resources required, these
architectures are rarely implemented on resource-constrained platforms. Current
research investigates hybrid handcrafted convolution-based and attention-based
models for CV tasks such as image classification and object detection. In this
paper, we propose HyT-NAS, an efficient Hardware-aware Neural Architecture
Search (HW-NAS) including hybrid architectures targeting vision tasks on tiny
devices. HyT-NAS improves state-of-the-art HW-NAS by enriching the search space
and enhancing the search strategy as well as the performance predictors. Our
experiments show that HyT-NAS achieves a similar hypervolume with about 5x
fewer training evaluations. Our resulting architecture outperforms MLPerf
MobileNetV1 by 6.3% in accuracy with 3.5x fewer parameters on Visual Wake
Words.
Comment: CODAI 2022 Workshop - Embedded System Week (ESWeek)
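The hypervolume cited in this abstract is a standard indicator for comparing Pareto fronts. The sketch below computes it for two minimized objectives (for example error and latency) against a reference point. The function name, the choice of objectives, and the sample points are assumptions for illustration, not the HyT-NAS implementation.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by `ref` (both objectives minimized)."""
    # Keep only points that are strictly better than the reference point,
    # sorted by the first objective (ties broken by the second).
    inside = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in inside:
        # A point adds area only if it improves the second objective
        # over every point with a smaller first objective.
        if f2 < best_f2:
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


if __name__ == "__main__":
    front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    print(hypervolume_2d(front, ref=(4.0, 4.0)))
```

A larger hypervolume means the front covers more of the objective space, which is why two search strategies can be compared by the number of evaluations each needs to reach a similar value.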
Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices
The recent surge of interest surrounding Multimodal Neural Networks (MM-NN)
is attributed to their ability to effectively process and integrate multiscale
information from diverse data sources. MM-NNs extract and fuse features from
multiple modalities using adequate unimodal backbones and specific fusion
networks. Although this helps strengthen the multimodal information
representation, designing such networks is labor-intensive. It requires tuning
the architectural parameters of the unimodal backbones, choosing the fusing
point, and selecting the operations for fusion. Furthermore, multimodality AI
is emerging as a cutting-edge option in Internet of Things (IoT) systems where
inference latency and energy consumption are critical metrics in addition to
accuracy. In this paper, we propose Harmonic-NAS, a framework for the joint
optimization of unimodal backbones and multimodal fusion networks with hardware
awareness on resource-constrained devices. Harmonic-NAS involves a two-tier
optimization approach for the unimodal backbone architectures and fusion
strategy and operators. By incorporating the hardware dimension into the
optimization, evaluation results on various devices and multimodal datasets
have demonstrated the superiority of Harmonic-NAS over state-of-the-art
approaches, achieving up to 10.9% accuracy improvement, 1.91x latency reduction,
and 2.14x energy efficiency gain.
Comment: Accepted to the 15th Asian Conference on Machine Learning (ACML 2023)
Performance Evaluation and Design Tradeoffs of On-Chip Interconnect Architectures
Network-on-Chip (NoC) has been proposed as an alternative to bus-based schemes to achieve high performance and scalability in System-on-Chip (SoC) design. Performance analysis and evaluation of on-chip interconnect architectures are widely based on simulations, which become computationally expensive, especially for large-scale NoCs. In this paper, a Network Calculus-based methodology is presented to analyze and evaluate performance and cost metrics such as latency and energy consumption. The 2D Mesh, Spidergon and WK-recursive on-chip interconnect architectures are analyzed using this methodology, and the results are compared with those produced by simulation. The values obtained by simulation and by analysis show similar trends and are of the same order of magnitude. Furthermore, WK outperforms the other on-chip interconnects in all considered metrics.
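To show the kind of closed-form bound that Network Calculus provides in place of simulation, the sketch below uses the textbook case of a token-bucket arrival curve (burst `sigma`, rate `rho`) crossing rate-latency servers (rate `R`, latency `T`). These curve shapes and all names are assumptions for illustration; the abstract does not state which curves the paper uses.

```python
from typing import Iterable, Tuple


def delay_bound(sigma: float, rho: float, R: float, T: float) -> float:
    """Worst-case delay for a token-bucket flow through a rate-latency server."""
    if rho > R:
        raise ValueError("unstable: arrival rate exceeds service rate")
    return T + sigma / R


def backlog_bound(sigma: float, rho: float, R: float, T: float) -> float:
    """Worst-case buffer occupancy for the same flow and server."""
    if rho > R:
        raise ValueError("unstable: arrival rate exceeds service rate")
    return sigma + rho * T


def concatenate(servers: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Collapse a path of rate-latency servers (R, T) into one equivalent server."""
    servers = list(servers)
    return min(R for R, _ in servers), sum(T for _, T in servers)


if __name__ == "__main__":
    # Hypothetical two-hop path; units are flits and cycles.
    R, T = concatenate([(4.0, 2.0), (2.0, 1.0)])
    print(delay_bound(sigma=8.0, rho=1.0, R=R, T=T))
```

Concatenating the hops first and bounding once gives a tighter end-to-end delay than summing per-hop bounds, because the burst term is counted a single time.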
FLASH-RL: Federated Learning Addressing System and Static Heterogeneity using Reinforcement Learning
Federated Learning (FL) has emerged as a promising Machine Learning paradigm,
enabling multiple users to collaboratively train a shared model while
preserving their local data. To minimize computing and communication costs
associated with parameter transfer, it is common practice in FL to select a
subset of clients in each training round. This selection must consider both
system and static heterogeneity. Therefore, we propose FLASH-RL, a framework
that utilizes Double Deep Q-Learning (DDQL) to address both system and static
heterogeneity in FL. FLASH-RL introduces a new reputation-based utility
function to evaluate client contributions based on their current and past
performances. Additionally, an adapted DDQL algorithm is proposed to expedite
the learning process. Experimental results on MNIST and CIFAR-10 datasets have
shown FLASH-RL's effectiveness in achieving a balanced trade-off between model
performance and end-to-end latency against existing solutions. Indeed, FLASH-RL
reduces latency by up to 24.83% compared to FedAVG and 24.67% compared to
FAVOR. It also reduces the training rounds by up to 60.44% compared to FedAVG
and +76% compared to FAVOR. In fall detection using the MobiAct dataset,
FLASH-RL outperforms FedAVG by up to 2.82% in model performance and reduces
latency by up to 34.75%. Additionally, FLASH-RL achieves the target performance
faster, with up to a 45.32% reduction in training rounds compared to FedAVG.
Comment: Accepted in the 41st IEEE International Conference on Computer Design
(ICCD 2023)
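For readers unfamiliar with Double Deep Q-Learning, the sketch below shows the standard double-Q bootstrap target: the online network picks the next action and the target network scores it. This is the generic formulation, not the adapted DDQL algorithm that the abstract mentions, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def double_q_target(
    reward: float,
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    done: bool,
) -> float:
    """Standard double-Q target for one transition.

    q_online_next and q_target_next hold the Q-values of every action
    in the next state, from the online and target networks respectively.
    """
    if done:
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... while its value is read from the target network, which limits
    # the overestimation that a single max over one network produces.
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    print(double_q_target(1.0, 0.5, [0.2, 0.8, 0.5], [1.0, 2.0, 3.0], done=False))
```

In a client-selection setting, each action would correspond to choosing a client (or a set of clients) for the next training round, with the reward supplied by a utility function such as the reputation-based one described above.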
An MDE Approach for Energy Consumption Estimation in MPSoC Design
Energy consumption is a leading criterion to take into account in the design of multiprocessor systems on chip (MPSoC). In this paper, we present a solution to estimate the energy consumption early in MPSoC design in order to find a good performance/energy trade-off in the design flow. This solution is based on the injection of consumption estimators between the hardware components during the co-simulation of a system at the CABA (Cycle Accurate Bit Accurate) level. These estimators are designed using a design framework, and the corresponding SystemC code is automatically generated thanks to a model-driven approach. Our solution offers an energy estimation framework without changing the IP (Intellectual Property) source code, using standalone estimation modules, which allows their reuse. The accuracy of this approach is checked by integrating the consumption estimation in the simulation of significant applications.
An Efficient Power Estimation Methodology for Complex RISC Processor-based Platforms
In this contribution, we propose an efficient power estimation methodology for complex RISC processor-based platforms. In this methodology, Functional Level Power Analysis (FLPA) is used to set up generic power models for the different parts of the system. Then, a simulation framework based on a virtual platform is developed to accurately evaluate the activities used in the related power models. The combination of these two parts leads to a heterogeneous power estimation that gives a better trade-off between accuracy and speed. The usefulness and effectiveness of our proposed methodology are validated on ARM9 and ARM Cortex-A8 processor-based platforms, the OMAP5912 and OMAP3530 boards respectively. The efficiency and accuracy of our proposed methodology are evaluated using a variety of programs, from basic programs to complete media benchmarks. Estimated power values are compared to real board measurements for both the ARM940T and ARM Cortex-A8 architectures. Our power estimation results show less than 3% error for the ARM940T processor and 3.5% for the ARM Cortex-A8 processor-based system, and are 1x faster compared to state-of-the-art power estimation tools.
System-Level Power Estimation Methodology for MPSoC based Platforms
With the rise of new submicron silicon integration technologies, power consumption in multiprocessor systems on chip (MPSoC) has become a primary factor in the design flow. Taking this key factor into account from the earliest design phases plays a major role, since it increases component reliability and reduces the time to market of the final product. Shifting the design entry point up to the system level is the most important countermeasure adopted to manage the increasing complexity of Multiprocessor System on Chip (MPSoC). The reason is that decisions taken at this level, early in the design cycle, have the greatest impact on the final design in terms of power and energy efficiency. However, taking decisions at this level is very difficult, since the design space is extremely wide and exploring it has so far been mostly a manual activity. Efficient system-level power estimation tools are therefore necessary to enable proper Design Space Exploration (DSE) based on power/energy and timing.
- …